Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier

نویسندگان

  • Simon Lucey
  • Sridha Sridharan
  • Vinod Chandran
چکیده

The adaptive fusion of video and audio is one of the fundamental pursuits of audio visual speech recognition (AVSR). In this paper the use of a high dimensional secondary classijier on the word likelihood scores from both the audio and video modalities is investigated fo r the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion across a number of audio noise

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving visual noise insensitivity in small vocabulary audio visual speech recognition applications

Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. In this paper the use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Prelimin...

متن کامل

Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities

An audio-visual speaker identification system is described, where the audio and visual speech modalities are fused by an automatic unsupervised process that adapts to local classifier performance, by taking into account the output score based reliability estimates of both modalities. Previously reported methods do not consider that both the audio and the visual modalities can be degraded. The v...

متن کامل

Keyword Spotting Based On Decision Fusion

Automatic speech recognition (ASR) technology is available now-a-days in all handsets where keyword spotting plays a vital role. Keyword spotting performance significantly degrades when applied to real-world environment due to background noise. As visual features are not affected much by noise this provides better solution. In this paper, audio-visual integration is proposed which combines audi...

متن کامل

Two-Level Bimodal Association for Audio-Visual Speech Recognition

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic and the visual data streams are combined at the feature level by using the canonical correlation analysis, which deals with the problems of audio-visual synchronization and utilizing the cross-modal correlation. Second...

متن کامل

An investigation of HMM classifier combination strategies for improved audio-visual speech recognition

The combining of independent audio and visual HMM classifiers (late integration) has been shown to out perform the combination of audio and visual features in a single HMM classifier (early integration) when either or both modalities are presented with distortion for the task of speech recognition. Theoretical foundations for the optimal combination of these audio and video classifiers are stil...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004